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Whether a dwarf spheroidal galaxy is in equilibrium or being 
tidally disrupted by the Milky Way is an important question for the 
study of its dark matter content and distribution. This question is 
investigated using 328 recent observations from the dwarf spheroidal 
Leo I. For Leo I, tidal disruption is detected, at least for stars suffi- 
ciently far from the center, but the effect appears to be quite modest. 
Statistical tools include isotonic and split point estimators, asymp- 
totic theory, and resampling methods. 

1. Introduction. The dwarf spheroidal galaxies near the Milky Way are 
among the least luminous galaxies in the night sky. While they have stel- 
lar populations similar to those of globular clusters, approximately 10^-10^ 
stars, they are considerably larger systems, typically hundreds, even thou- 
sands of parsecs in size compared to radii of tens of parsecs characteristic of 
clusters. They are excellent candidates for the study of dark matter because 
they are nearby and generally have extremely low stellar densities. More- 
over, due to their proximity to the Milky Way, many of these dwarf galaxies 
are also potentially strongly affected by disruptive tidal effects. The mere 
existence of dwarf spheroidal galaxies suggests these systems contain much 
dark matter since it is unclear how they could have avoided the effects of 
tidal disruption without the added gravitational force from a considerable 
reservoir of unseen matter [see Mufioz, Majewski and Johnston (2008) and 
Mateo, Olszewski and Walker (2008)]. 

Detailed kinematic studies [Wang et al. (2005), Walker et al. (2006) and 
Wang et al. (2008a)] confirm the widespread belief that the dwarf spheroidal 
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galaxies are dominated by dark matter, in many cases finding that the dark 
matter densities exceed that of visible matter by a few orders of magnitude. 
These latest studies differed from their predecessors, for example, Mateo 
et al. (1993), Mateo (1998), Mateo et al. (1998) (and references therein), 
and Kleyna et al. (2002), statistically by using a nonparametric analysis to 
estimate the distribution of dark matter. While the nonparametric analysis 
did not require a specific form for this distribution, it did assume that the 
galaxies are in equilibrium and isotropic. The purpose of the present paper is 
to explore the gravitational effects of the Milky Way on the dwarf spheroidals 
and, implicitly, to probe the underlying assumptions used in Wang et al. 
(2005), Walker et al. (2006) and Wang et al. (2008a). 

To this end, our approach is to address these issues using recent data 
[Mateo, Olszewski and Walker (2008)] for the dwarf spheroidal galaxy Leo I 
from which we obtain kinematic observations of 328 stars. Among the Milky 
Way's dSph satellites, Leo I is perhaps the most distant, at 255 it 3 kpc, and 
is receding from the Milky Way at a relatively large velocity of 179.5 it 0.5 
km/s. The combination of the large distance and high velocity led Byrd et al. 
(1994) to suggest that Leo I may not be bound to the Milky Way. Bound or 
unbound, the large outward velocity means that Leo I passed much closer 
to the Galactic Center in the past. In the preferred model of Byrd et al. 
(1994), Leo I passed within 70 kpc of the Galactic Center, a distance similar 
to the closest present day dwarf spheroidal galaxies. Recent papers [Sohn 
et al. (2007), Mateo, Olszewski and Walker (2008)] suggest more specific 
models in which Leo I passed within 10-20 kpc from the center of the Milky 
Way some 1-2 Gyr ago. 

So, the question becomes as follows: What effect, if any, did this close 
encounter have on Leo I? In some cases, a close encounter with the Galactic 
Center can change the shape of a dwarf spheroidal by producing tidal arms 
[Oh, Lin and Aarseth (1995), Piatek and Pryor (1995)]. Prominent tidal 
arms are not observed in Leo I, but this may reflect our unfavorable viewing 
angle rather than the actual lack of such features [Mateo, Olszewski and 
Walker (2008)]. A more subtle but related effect is streaming motion. The 
practical observational signal of this process arises from the fact that both 
leading and trailing stars move away from the center of the main body of 
the perturbed systems in the reference frame of that galaxy. Since stars near 
the center are more tightly bound to the galaxy than stars near the edge, 
the magnitude of streaming is expected to increase with distance from the 
center and be aligned with the apparent major axis of the galaxy. Another 
question of interest here is whether there is a threshold below which the 
magnitude of streaming motion is negligible and beyond which it is appre- 
ciable, as suggested by Mateo, Olszewski and Walker (2008). A threshold 
model is motivated by the nature of the tidal forces a dwarf galaxy experi- 
ences as it follows a highly elliptical orbit around the Milky Way. A more 
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detailed account of streaming motion may be found in Section 4.3 of Mateo, 
Olszewski and Walker (2008). 

There are several interesting statistical questions here. Is streaming mo- 
tion evident in Leo I? If so, how can it be described and estimated? To what 
extent can it be described by a threshold model, in which streaming motion 
is only present for stars at a sufficient distance from the center? We answer 
these questions within the context of a model, called the cosine model be- 
low, that incorporates the qualitative features of streaming motion described 
above (increasing with distance from the center and largest along the major 
axis). The answers may be summarized: The magnitude of streaming mo- 
tion appears to be modest, at most 6.19 km/s, but is (nearly) significant 
at the 5% level. The streaming motion does appear to be consistent with a 
threshold model, but it is difficult to constrain the threshold. This may re- 
flect the inherent "fuzziness" of such a threshold radius, plus the fact that, 
due to projection effects, stars associated with streaming motions can be 
superposed on the sky with regions of stars that do not show any streaming. 

The data are described in Section 2. In Section 3 we review the bisec- 
tor test used in Mateo, Olszewski and Walker (2008). The cosine model is 
introduced in Section 4 and used to estimate the magnitude of streaming 
motion and motivate a test for significance. Threshold models are considered 
in Section 5. Section 6 contains remarks, outlining possible extensions. The 
Appendix states some of the asymptotic results used in the paper. 

2. The data. 

The data. The data used here consist of position and velocity measure- 
ments for candidate member stars from Leo I. These were derived from 
observations using the multi-fiber Hectochelle spectrograph on the MMT 
telescope at Las Campanas Observatory during March and April of 2005, 
2006, and 2007. The raw spectra were converted to velocity measurements 
using fxcor in IRAF (the Image Reconstruction and Analysis Facility) , which 
returns a velocity measurement and an estimate of the standard deviation 
of measurement error for each star. A detailed description of the observation 
and reduction processes is included in Mateo, Olszewski and Walker (2008). 
For each star the four variables of primary interest here were line of sight 
velocity, position projected on the plane orthogonal to the line of sight, and 
the standard deviation of measurement error for the velocity. Velocities Y 
and the standard deviations S are expressed in km/s. Position is expressed 
in polar coordinates {R,Q), with R measured in arc seconds and O in de- 
grees, so defined that O = or 180 along the major axis. For Leo I, 400 arc 
seconds are roughly 500 parsecs. 
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Fig. 1. Histogram of velocities for Leo I before and after trimming. 



Trimming. A complicating feature of the data is that not all stars in 
the sample are really members of the galaxy. Some are foreground stars, 
located along our line of sight toward Leo I. Fortunately, due to Leo Ls 
large systemic velocity, the radial velocities of nonmembers are quite distinct 
from those of the galaxy itself. Velocities of the galaxy members are fairly 
tightly clustered around a well-defined and, in this case, very large positive 
velocity, while velocities of nonmembers have a much broader distribution 
centered much closer to zero heliocentric velocity; see Figure 1. To eliminate 
the foreground stars, we computed (an estimate of) the probability that each 
star is a galaxy member, using the method of Sen, Walker and Woodroofe 
(2008). For the Leo I, these probabilities were either at least 0.99 or at most 
0.01. We eliminated the stars with low probabilities and kept 328 others. 
Some descriptive statistics of the trimmed sample are presented in Table 1. 
Observe that the trimmed sample consists of stars whose velocities are within 
three standard deviations of their mean. The data from Leo I is exceptional 
in that all of the inclusion probabilities are either very high or very low. 
Data from the dwarf spheroidals Sculptor and Sextans provide examples 
with many intermediate values. In such cases, it is sometimes possible to 
include other variables, like metallicity (magnesium and calcium indices), to 
improve discrimination [see Walker et al. (2008)]. 



Selection. The data are regarded as a random sample from Leo I, but not 
a simple random sample, since some regions were sampled more extensively 
than others. Thus, the joint density of {R, @) is of the form 

(1) f{r,e)^u{r,e)g{r,e), 

where g is the density of R and within the population and u is the selection 
function. Figure 2 presents a scatter plot of {R, 0) for the trimmed sample 
from which some effects of selection may be seen: Within the population of 
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Fig. 2. Scatterplot of {R,0) for the Leo I data. 



Leo I stars, it is not unreasonable to assume that R and Q are independent 
and that Q has a uniform distribution [Mateo, Olszewski and Walker (2008)]. 
The data in Figure 2 are clearly not consistent with this assumption: Stars 
beyond 400 arc seconds divide into two groups, one centered roughly around 
6 = and the other roughly around 9 = —180. That is, for stars beyond 400 
arc seconds, the sample divides into two branches centered around the major 
axis on the two sides of the galaxy. This assumption was not intentional since 
candidate members were selected as uniformly as feasible in G over the full 
range of R shown in Figure 2. 

If we do suppose that R and are independent and that G has a uni- 
form distribution, within the population, then it is possible to estimate the 
selection function u. The marginal density of R within the population of 
Leo I stars can be estimated with some precision from the large sample of 
positions reported in Irwin and Hatzidimitriou (1995). Thus, the joint den- 
sity g of R and Q can be estimated with some precision. It is also possible 



Table 1 
Descriptive statistics for Leo I 





R 





cos(0) 


Y 


S 


min 


2.3000 


-256.3000 


-1.0000 


260.1000 


1.6000 


max 


848.5000 


99.8000 


1.0000 


311.1000 


7.6000 


med 


259.8000 


-74.2000 


0.0932 


282.6500 


2.0000 


mean 


283.3213 


-77.6591 


0.0932 


283.0927 


2.1302 


stdev 


171.9573 


100.4565 


0.7619 


9.4144 


0.6470 
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to estimate / from our selected sample, using a kernel estimate, for exam- 
ple, and then u can be recovered from (1). Calculations of this nature are 
reported in Wang et al. (2005). We do not pursue this here because most of 
our analysis is conditional on position and, so, unaffected by the selection. 

A Model. To describe the effects of streaming motion, let V denote the 
line of sight velocity of a star and suppose that, within a galaxy, R and Q 
have a joint density g and V = i^{R, Q) + £, where e is a random fluctuation 
with mean and variance u^, and e is independent of (i?, Q). Thus, iy{r, 0) is 
the expected velocity, given R = r and Q = 6. Velocity is measured with some 
error. We observe {Y, S), where Y = V + 5 and the conditional distribution of 
5 given (i?, 0,e, S) is normal with mean and standard deviation S. Thus, 
for the selected sample, 0j, 1^, Sj), i = 1, . . . , n = 328, are independent 
and identically distributed random vectors for which have density 

/, 

(2) Yi = v{R,,Qi)+ei + 5i, 

where {Ri,Qi),£i, and are independent, and the conditional distribution 
of 6i given {Ri,@i,ei,T,i) is normal with mean and variance S?. The stan- 
dard deviation a is called the velocity dispersion and is of interest because 
of its relationship to the mass distribution. Every known (to us) estimator of 
galaxy mass uses an estimate of the velocity dispersion at some level. These 
range from simple expressions, like a^/r, for a disk galaxy, to more complex 
relations, like Jeans' Equation, for pressure driven systems, like Leo I; see 
Binney and Tremaine (1987). 

For the remainder of the paper, let < r2 < • • • < r„ denote the ordered 
values of Ri, . . . , Rn, and let 6i, . . . , On, cri, . . . , cjn, and yi,. . . , the con- 
comitant order statistics of Gj, Sj, and YJ. To avoid selection effects, we con- 
dition on the position variables ri, . . . , r„, ^i, . . . , 0„ in subsequent analysis. 
Probability and expectation mean conditional probability and expectation, 
unless otherwise noted. 

3. The bisector test. An intuitive test for the presence of streaming mo- 
tion was developed in Mateo, Olszewski and Walker (2008). To begin, stars 
distant from the center, say, r > ro, were selected. There were two reasons 
for this selection: The effect of streaming motion is expected to be small for 
stars close to the center and increase with distance from it, and the sample 
divides into two quite distinct branches for stars at least 400 arc sec from 
the center; see Figure 2. This resulted in reduced samples which were then 
divided into two groups by passing bisectors through the data set, and the 
difference in average velocities for stars in the two groups were computed. 
The bisectors were of the form cos(^ — lo) = 0, where to was allowed to vary. 
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In more detail, let 

i>ro,cos(6i— ii;)<0 



(3) ^^V{oo) 



2 ' 



consider the test statistic Bq = maX(^ AoF(a;). Attained significance levels, 
0.030, 0.006, 0.014, and 0.101 for the reduced samples r > 400, r > 455, r > 
600, and r < 400, were reported in Mateo, Olszewski, and Walker (2008) 
using a permutation test. 

The idea is sound, but there are details. Supposing that i/(r, 9) = u is 
constant, so that there is no streaming motion, let 

^ Er=lj/»/(^0+^.') . .2 l^r. ^.2 21 



n 



1=1 



Then uq and iTq are \/n-consistent estimators of u and cr^. Define AiV{io) 
by (3) with af replaced by ag + af and let Bi = max^ AiV (lo) . Using Bi 
and permuting (yi, ui), . . . , (t„) in the permutation test, we obtained 
somewhat higher significance levels. Plots of AiV {uj) for the reduced samples 
r < 400 and r > 500 are shown in Figure 3. For stars at least 500 arc seconds 
from the center, AiV{uj) attains its maximum value of 10.0682 around 69 
degrees, but this spike is very slender, and is, in fact, due to the influence of 
a single star. One expects the effect of streaming motion to be large along 
the major axis of Leo I (w = or 180), and this is the case in Figure 3 and 
others like it (not included). For this reason, the test that rejects for large 
values of |Aiy(0)| is also considered. 

Table 2 shows (estimated) significance levels for Bi and |Aiy(0)| for 
several values of rg. While the significance levels are higher than reported 
in Mateo, Olszewski and Walker (2008), they still suggest that streaming 
motion in present for stars sufficiently far from the center. The dependence 
on ro is troubling, however, and the results are far from conclusive. In the 

Table 2 

Test statistics and significance levels 





Bi 


PBi 


arg max Ai V 


|Aiy(0)| 


Po 


r <400 


1.5264 


0.809 


6.88 


0.9485 


0.401 


r > 400 


5.9065 


0.210 


69.33 


4.2182 


0.108 


r > 450 


7.5152 


0.106 


69.33 


4.4935 


0.132 


r>500 


10.0682 


0.032 


69.33 


6.2498 


0.070 


r >600 


12.2756 


0.129 


69.33 


9.9049 


0.052 


r > 700 


15.0053 


0.080 


69.33 


11.2687 


0.064 


r >750 


19.4157 


0.047 


75.63 


2.8974 


0.688 
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next section we present another test which avoids the arbitrary choice of rg, 
at the expense of setting u; = 0. 

The details of the permutation test are as foUows. Consider a test statistic 
T = r(r,0,y,(T), where r = (ri,...,r„), = (6*1, . . . , 6'„), y = (yi, . . . , yn), 
and <T = (a"i, . . . , (Tn). If there is no streaming motion, then (i?, 0) and 
(y, S) are independent. In this case {yi,ai), . . . ,{yn,(Tn) are conditionally 
i.i.d. given (ri, ^i), . . . , (r„, and the conditional probability that T> 
t given {r,6) and the unordered values {(yi, ai), . . . , ct„)} is i^{iT : 
T{r, 9, Try ,na) >t}/n\, where n denotes a permutation of {!,..., n} and 
vry and irtr denote permuted versions of (yi,...,y„) and (cii, . . . , o"n). Of 
course, it is not possible to examine all n! permutations, but it is possible 
to estimate the conditional probability by sampling permutations. The sig- 
nificance levels listed in Table 2, were obtained from 10,000 permutations of 
the reduced samples. Observe that we permute the pairs (yi,(Tj). 

4. The cosine model. We now suppose that z/(r, 9) = E{Y\R = r,Q = 6) 
is of the form 

(4) u{r, 6) = iJ + X{r) cos{e), 

where u is a constant and A is a nonnegative, nondecreasing function. Thus, 
|i^(r, 0) — z^l is assumed to be nondecreasing in r and largest along the ma- 
jor axis. From an astronomical standpoint, the cosine model is justified by 
detailed simulations of streaming motions [e.g., Piatek and Pryor (1995), 
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Munoz, Majewski and Johnston (2008)]. In these studies the kinematic sig- 
nature of tidal streaming is noted to closely mimic that of disk rotation. The 
latter, due to purely geometric projection effects, is precisely modeled by a 
cosine functional form (with only very minor corrections due to projection 
effects). Thus, the cosine model is wellmotivated for a wide range of under- 
lying physical mechanisms that might produce the observed kinematics. 

Estimation. Assuming cr^ to be known, the (weighted) conditional least 
squares estimators P and A, given (r, 0,<t), minimize 



with respect to f G M and nonnegative, nondecreasing functions u. Differen- 
tiation then gives the following conditions for the least squares estimators: 



for all nonnegative, nondecreasing < < • • • < ^n. We will use these con- 
ditions with cj^ replaced by 





and 






Thus, letting Wi = 1/(ct^ -|- af) and Wn = wi-\ \- w. 





and 



(7) 



A(rfc) = max[0, A(rfc)] 



where 





Alternatively, letting = w\ cos^(0i) -|- • • • -|- ■Wyt cos^(6'fc) 
^*(*)= H WiCos(6'i)[2/i-z>], 



i:fi<t 
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and A = GCM(A*), the greatest convex minorant of A*, A(rfc) = A'(Tfc), the 
left-hand derivative of A at T^. See Robertson, Wright and Dykstra (1988), 
Chapter 1 for background on isotonic estimation. 

For Leo I, iterating (5), (6), and (7) leads to convergence to three decimal 
places after four iterations. For this data set, a = 9.0107 and v = 283.1040. 
The function A is graphed in the left panel of Figure 4. The large value at 
the right end point is almost certainly due to the spiking problem: Isotonic 
estimators of an increasing function are inconsistent at the right end point, 
where they are simply too large. Heuristically, the reason for this can be seen 
from (8): A(r„) is the maximum of an increasing number of terms (if a and 
i> are held fixed). A similar, but more severe, problem arises in monotone 
density estimation and has received more attention. See Woodroofe and Sun 
(1993) and Kulikov and Lopuhaa (2006). To eliminate spiking, we replace 
A(?'n) by 6.193, the average of the last thirteen values of A(rfc), adapting the 
suggestion of Kulikov and Lopuhaa (2006) to our context where data are 
much sparser. This limits the effect of the last observation in the subsequent 
calculations. The truncated A appears in the right panel of Figure 4. 



Confidence intervals. The asymptotic distribution of A(r) can be de- 
rived under modest conditions. Suppose that A is continuously differentiable 
around r > 0, A'(r) > 0, and let C = E[1/{(t'^ + T?)] and 

1/3 



7r 



A'(r) 



2C J f{r,0) cos"^ (6) de ■ 

Then, a (fairly) straightforward application of the Argmax theorem [van der 
Vaart and Wellner (2000)] shows that the asymptotic unconditional distri- 
bution of C„ = n^/^[A(r) — A(r)]/7r is Chernoff's distribution [Groeneboom 
and Wellner (2001)], the distribution of argminj W(t) where W denotes 
a standard two sided Brownian motion. Thus, is an asymptotic pivot. 




too 200 300 400 500 600 700 800 900 



100 200 300 400 5O0 60O 700 300 90O 



Fig. 4. Left: X before truncation. Right: X after truncation. 



STREAMING MOTION IN LEO I 



11 




Fig. 5. ASSE{500,£,) as ^ varies from to 6, with the 90% cut-off mark (dashed). 



but it is difficult to use this result to set confidence intervals for A(r), be- 
cause it is difficult to estimate A'(r) and hence the normalizing constant 
7^. Moreover, even the condition A'(r) > is suspect on the interval where 
A = 0. 

It is possible to avoid the problem of estimating 7,,, though not the con- 
dition A'(r) > 0, by adapting the likelihood based confidence intervals of 
Banerjee and Wellner (2001) to the present problem. For fixed ^0,^0 > 0, 
and £7 > 0, let 

[Vi-v -u{ri)cos{9i)f 



ASSE{ro,^o 



(9) 



min 

"(^•o)=5o - I 



a' + af 



. ^[Vi-v- u{ri) cos{Bi)f 
mm > 

1=1 



where implicitly both minimizations are over f £ M and nonnegative, nonde- 
creasing functions u. If A(ro) = ^0^ A is continuously differentiable near ro, 
and A'(ro) > 0, then A5S'i?(ro, ^0) has a limiting distribution that does not 
depend on any unknown parameters; and the same asymptotic distribution 
is obtained with a replaced by a, as a converges at a much faster rate. A 
description of the asymptotic distribution, including graphs and percentiles, 
may be found in Banerjee and Wellner (2005). In particular, (Monte Carlo 
estimates of) the 90th and 95th percentile are 1.61 and 2.29 and, for ex- 
ample, : AS'5ii'(ro, ^) < 1.61} is an approximate 90% confidence set for 
A(ro). A plot of AS'S'i?(500,^) is shown in Figure 5, and selected confidence 
intervals are listed in Table 3. 

Resampling provides still another way to set confidence intervals. Recent 
results [Kosorok (2007), Lee and Pun (2006), Leger and MacGibbon (2006), 
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Sen, Banerjee and Woodroofe (2008)], some obtained for related problems, 
suggest that direct use of the bootstrap will not provide consistent estimators 
of the distribution of sampling error but that use of a smoothed bootstrap 
or m out of n bootstrap will. Thus, let 

where K \sa, kernel and b a bandwidth. We used the standard normal density 
for K and chose the bandwidth 6 = 0.1 (subjectively) to compromise between 
smoothness and fit. The result is shown in the left panel of Figure 6. The 
right panel shows the derivative of the smoothed estimator, illustrating the 
difficulty in estimating A'(r). 

Now, let Ci denote the residuals, ei = yi — v — \s{ri) cos(0j), i = 1, . . . , n. 
Let Xi = {ai,ei), and let denote the empirical distribution of xi, . . . ,Xn- 
Further, let (5i, Zi), . . . , (iS^, Z„,) ~ be conditionally independent given 
{r,e,y,a); let 

(10) y* = i) + Xs{ri) cos{6i) + Zi and a* = Sf, 

and let A* denote the (truncated) isotonic estimator (7) computed from 
, . . . , y* and o" J , . . . , cr* with ri , . . . , r„ and 9i, . . . ,9n held fixed. To set 
confidence intervals for A(ro), we estimate the distribution of A(ro) — A(ro) 
by the conditional distribution of A*(ro) — As(ro), which may be computed 

Table 3 

90% and 95% confidence intervals for X{r) 



ASSE Bootstrap 



ro 




0.90 






0.95 




0.90 




0.95 


A 


L 


U 


CP 


L 


U 


CP 


L 


u 


L 


u 


400 





3.54 


0.901 





3.86 


0.952 





3.57 





3.57 


1.92 


450 





3.63 


0.887 





4.01 


0.943 





3.74 





3.85 


1.92 


500 


0.10 


4.50 


0.882 





5.02 


0.936 





3.58 





3.90 


1.98 


550 


0.26 


4.65 


0.852 





5.19 


0.916 





3.44 





3.76 


1.98 


600 


0.26 


6.66 


0.827 





7.30 


0.897 





3.30 





3.64 


1.98 


650 


0.36 


6.70 


0.865 


0.05 


7.39 


0.922 





3.58 





3.97 


1.99 


700 


0.36 


8.88 


0.913 


0.05 


9.56 


0.961 





4.26 





4.69 


1.99 


750 


1.85 


8.88 


0.906 


1.37 


9.56 


0.952 


0.44 


7.86 





8.37 


5.37 



Notes: The leftmost column shows the radial distance. The next two columns are lower 
and upper endpoints of an approximate 90% confidence interval computed from ASSE; 
the fourth column is a bootstrap estimate of the coverage probability; the fifth, sixth, and 
seventh columns provide the same information for 95% confidence intervals. The next four 
columns are lower and upper endpoints of approximate 90% and 95% confidence intervals 
computed from the bootstrap. The last column provides the value of A. 
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Fig. 6. Left: The smoothed (dashed) and unsmoothed (solid) estimators with 6 = 0.1. 
Right: The derivative of the smoothed estimator. 
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7. Histograms o/ 10,000 realizations o/ A* (500) — As (500) (left) and D* (right). 



from simulation. The left panel in Figure 7 shows a histogram of 10,000 
values of A* (500) — As(500). Bootstrap confidence intervals for selected ro are 
listed in Table 3. Similarly, to set confidence bands for A, we approximate the 
distribution of D = max^ \ ~ by that of D* = max^ |A*(r) — As(r)|. 
The right panel in Figure 7 shows a histogram of 10,000 values of D* . The 
90th and 95th percentiles of this distribution are 5.194 and 6.074. We have 
not standardized these variables before bootstrapping, because it is difficult 
to estimate A'. 

Unfortunately, there are major differences between the two methods for 
setting confidence intervals. To some extent, these can be explained by the 
constructions: The bootstrap intervals attempt to balance the error proba- 
bilities equally, left and right; the intervals derived from ASSE make no 
such attempt. There are more serious differences, however, between the 
asymptotic values and the bootstrap estimates. The difference can be seen 
in the left panel of Figure 7: The histogram is asymmetric, whereas Cher- 
noff's distribution is symmetric. The ASSE method depends on the ap- 
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proximations P{ASSE[ro, X{ro)] < 1.61} w 0.90 and P{ASSE[ro, X{ro)] < 
2.29} ~ 0.95, where now P denotes the unconditional probability. The boot- 
strap estimates of these probabilities, P*{ASSE*[rQ, \s{ro)] < 1.61} and 
P*{ASSE*[rQ, \s{ro)] < 2.29}, are reported in columns three and six of Ta- 
ble 3. There is good agreement for tq < 450 and tq > 700. This is important, 
because the positive lower confidence bounds on the last two lines of Ta- 
ble 3 reinforce the conclusions in Section 3 that there is streaming motion. 
But the bootstrap estimates are substantially less than the nominal values 
for 550 <r< 650, an interval that includes values for which A^(ro) is very 
small, and the positive lower confidence limits in this region should not be 
trusted. The disagreement between the bootstrap intervals and ones derived 
from asymptotic distributions is difficult to resolve, in part, because the jus- 
tification for the bootstrap is itself asymptotic and requires the condition 
A'(ro)>0. 

Testing. Within the cosine model (4) , testing for streaming motion means 
testing the null hypothesis A = 0. The positive lower confidence limits on the 
last lines of Table 3 suggest that this hypothesis can be rejected. This point 
can be made in another way that does not depend on asymptotics or even 
the validity of (4). Consider the F-like statistic 

n 

(11) J^ = J2^,cos\0,)X\ri), 

i=l 

were known, ei, . . . ,e„ were normal, Wi were replaced by Wi = 1/((T^ + af), 
and A were replaced by the isotonic estimator for known v and <t^, then —2J^ 
would be the log-likelihood ratio statistic for testing A = 0; see. Chapter 2 
of Robertson, Wright and Dykstra (1988). 

For the Leo I data set, the observed value of was 6.69. We again assess 
significance from the permutation distribution of but computed from the 
full sample {ri,6i,yi,ai),i = l,...,n = 328. In a sample of 10,000 permu- 
tations, the permuted value of T exceeded the observed value 553 times, 
roughly confirming the conclusion based on confidence intervals and asymp- 
totic calculations. The effect of truncation on the test statistic is amusing. 
Without the truncation the value of J- would have been substantially larger, 
8.72, but the significance level would have been essential unchanged, 0.0543. 

Observe that (4) was used only to motivate the form of the test statistic. 
The F-like test statistic serves also as a test of the hypothesis v{r,6) = in 
(2). 

5. Thresholds and the break point. By a threshold or breakpoint we 
mean a distance from the center of Leo I below which there is no streaming 
motion, or very little, and above which streaming motion is appreciable. This 
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threshold or tidal radius of the dwarf galaxy corresponds to the minimum 
radius where stars remain bound. Outside this radius, stars previously bound 
to the dwarf galaxy are formally on independent orbits about the Milky Way, 
though of course these orbits are generally very similar to that of the dwarf 
galaxy itself. This produces the streaming motion our method aims to detect. 
Consequently, one can expect that the kinematic streaming effects will only 
begin to appear outside the tidal radius, a feature our threshold model aims 
to mimic. Of course, projection effects and the fact that the stars of any 
galaxy exhibit a distribution of velocities mean that the threshold effect will 
not be arbitrarily sharp, but the transition from nonstreaming well inside the 
tidal radius to streaming outside this radius is quite reasonably motivated 
in an astronomical context. We consider two approaches to defining and 
estimating such a threshold, change point models and split points, as in 
Banerjee and McKeague (2007a). 

Change point models. Let r denote an upper limit for r. In the change 
point model it is assumed that there is a p > for which A(r) = for r < p 
and A(r) > for p <r <t, in which case we call p the threshold. 

We may obtain an upper confidence bound by modifying the F-like statis- 
tic (11). For a given po consider the hypothesis p> Pq. Let 

i:ri<po 

and let m be the largest integer for which rm < Po- Then the conditional null 
distribution of J^{po) is invariant under permutations of (yi, <ti), . . . , (ymi o"m)j 
so that significance can again be assessed from a permutation distribution of 
J^{po). Of course, the set of po for which the hypothesis is accepted at level 
a is a level 1 — a confidence set for p. For the Leo I data set with 1 — a = 0.9, 
this hypothesis is rejected when po = 720, which, therefore, serves as an up- 
per confidence bound. Again, the cosine model (4) was used only to motivate 
the form of the test statistic. The test just described also serves as a test of 
i/(r, 9) = v for all 9 and all r < pQ. 

Unsurprisingly, adopting an even more structured model suggests a lower 
bound. Suppose that A(r) = (5ijj(r — p), where /3 > is an unknown parameter 
and is a known function for which tp{x) = for x < and ip{x) > for 
X > 0. Thus, 

(12) yi = u + (3ip{ri - p)cos{9i) +ei + 6i 

for i = 1, . . . ,n. For a fixed p this is a simple linear regression model. Let 
(3r and Or denote the weighted least squares estimators, using the weights 
Wi = l/{a^ + af), derived from (12) assuming p = r, and let SSEr denote 
the residual sum of squares, 

n 

SSEr = Y'Wi[yi -Vr - M{ri - r)cos{9i)f. 

i=l 
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Then the LSE p oi p minimizes SSE^- with respect to r. Figure 8 shows 
graphs of SSEj. — SSE^ for two choices of ip, ip{x) = l(o,oo)(3^) and ipi^x) = 
max{0,x). The latter choice leads to the segmented regression model con- 
sidered in Hinkley (1971), Feder (1975), and recently in Huskova (1998). 
For this choice it may be shown that SSEp — SSEp has a limiting dis- 
tribution, assuming (12). So, {r-.SSE^ — SSEp < c} is an asymptotic level 
P[Xi ^ c] confidence set for p. For Leo I, the 90% asymptotic confidence 
set [0,654.5] U [823.1,848.5] so obtained is disconnected, but there are only 
four stars for which 823.1 <r < 848.5, and this interval is of little interest. 
Letting il^lx) = l(o,oo)(^) leads to (a minor variation on) the classical change 
point problem. The asymptotic distribution of SSEp — SSE^ may be ob- 
tained for this case too, but it is complicated and unnecessary in the sense 
that SSEr — SSEf) rises and falls so sharply near r = 333.5 and r = 702.7. 

Split points. It is possible to define and estimate a breakpoint without 
assuming that A(r) is actually equal to for small r, by fitting a stump model 
,-] to A, as in Banerjee and McKeague (2007a). This is accomplished by 
minimizing 

K(6,r)= r \^{s)h{s)ds+ r[X{s)-bfh{s)ds, 

Jo Jr 

with respect to h and r. Here /i is a positive continuous weight function. 
We used /i = 1 in Figure 9. Another possibility is to let h be the marginal 
density of i?, which is known from Irwin and Hatzidimitriou (1995). The 
minimization with respect to 6, of course, is simple, and 
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where H{r) = Jq h{s) ds and A(r) = /q X{s)h{s) ds. We define the break point 
7 to be the minimizing value of r, or the infimum of minimizing values if 
there is more than one. If A is continuous and A(0) < A(T)/[2i?(r)], then 

[A(r)-A(7)] 
2[H{r) - H{^)] ■ 

Observe that if there is a threshold p [for which A(r) = for r < p], then 
7 > /3, because A(r) = for r < 7. 

The simplest way to estimate 7 is to replace A by A in the definition. The 
left panel of Figure 9 shows a graph of KQQ{r), where kqo denotes kq with A 
replaced by A and rescaled to take values between and 1. That is, letting 
kq denote kq with A replaced by A and 7 a minimizing value of kQ{r), 

^ , N «^o(r) -«;o(7) 

i^m{r) = —- — — . 

max^ Ko{s) - Ko(7) 

With /i = 1, the estimated break point is 353.5 arc sec, but there is a near 
minimum at about 700. 

Asymptotic theory is not uninteresting from a theoretical point of view, 
but provides little useful guidance here. As described in the Appendix, the 
asymptotic unconditional distribution of 7 depends on A' (7) and would have 
to be approximated by simulation in any case. A bootstrap procedure does 
provide some guidance, however. Define yl and a* as in (10); let 7^ denote 
the value of 7 obtained by replacing A by A^ ; and let 7* and Kqq denote the 
values of 7 and kqo respectively, computed from y|, . . . , y*, erf, . . . , cr* , with 
ri, . . . , r„,, ^1, . . . , 0„ held fixed. Then the conditional distribution of ^^0(75) 
provides an estimate of the distribution of koo(7)- A histogram of 10,000 
values of Kqq(7jj) is shown in the right panel of Figure 9. The 90th and 95th 
percentiles of this distribution are 0.3694 and 0.4690. So, for example, the 
set of r for which koo('") ^ 0.3694 is a 90% bootstrap confidence set for 7. 
Unfortunately, this is a large interval, [91.1,734.3]. 




Fig. 9. Left: ko()('')- Right: Bootstrap estimate of the distribution o/koo(7)- 
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6. Some remarks. 

1. The conclusions regarding the presence of streaming motion have to be 
tentative, because of the large significance levels. One of the key factors 
behind this is the comparatively small sample size (n = 328) and, in 
particular, the very few observations far out from the center of the galaxy, 
the region of interest. In fact, we just have 64 data points above 400 arc 
seconds. We hope that with more data in the future our methods can 
be used more effectively to draw stronger conclusions. We also expect 
to obtain data on other dwarf spheroidal galaxies, for example, Draco, 
Fornax, etc., and will be applying variants of our methods to analyze the 
samples. 

2. With more data we can resort to more flexible modeling. For example, 
instead of simply using cos 6*, we could model the effect of angle 6, by 
a function h{cos9) or h{cos{6 — uj)), where h is nondecreasing and lj is 
an unknown parameter representing the direction of tidal streaming. The 
velocity dispersion parameter, o"^, has been assumed to be constant in our 
approach. For Leo I, there is evidence for such an assumption [see Mateo, 
Olszewski and Walker (2008)]. For other galaxies, it is conceivable that a 
may depend on r, the radial distance from the center of the galaxy. Our 
methods should be adaptable to this heteroscedasticity, but may require 
nontrivial extensions. 

3. Monotone regression splines provide a method for combining monotonic- 
ity constraint and smoothness. These were used effectively in Wang et al. 
(2005, 2008a, 2008b) and could be investigated in the present context. 

4. The smoothed bootstrap was invoked at several places in this paper for 
purposes of uncertainty assessment — for example, in constructing point- 
wise and simultaneous confidence bands for A, and confidence sets for the 
split point 7. While there is evidence [see Kosorok (2007), Sen, Banerjee 
and Wellner (2008) and Leger and McGibbon (2006)] that the smoothed 
bootstrap provides consistent estimates of pointwise confidence sets for A, 
the use of this method for approximating the distribution of D and that 
of ^00(7) remains to be vindicated. In particular, little is known about 
the limiting behavior of D or koo(7)- An alternative to the smoothed 
bootstrap would have been to use subsampling or the m out of n boot- 
strap. 

5. The asymptotic distribution of the least squares estimate of 7 in the 
split point model, stated in the Appendix, is curious in the sense that (a 
multiple of) Chernoff 's distribution no longer arises and is replaced by a 
nonstandard limit. We know of no other situations in the published liter- 
ature where this limit distribution has been encountered. Furthermore, it 
does not seem possible to represent the limit as a multiple of a universal 
distribution, which renders the computation of quantiles difficult. 



STREAMING MOTION IN LEO I 



19 



6. An interesting but difficult problem is to estimate the threshold parame- 
ter p nonparametrically, assuming only that A(r) = for r < p, A(r) > 
for r > p and that A is increasing. We expect the rate at which p can 
be estimated to depend crucially on the smoothness of the join between 
the two segments of the function at p; the smoother the join, the slower 
the convergence. The intuitive estimator inf{t : A(t) > 0} underestimates p 
heavily. One suggested modification is to replace by a positive threshold 
that decreases to at an appropriate rate. Yet another approach would 
be to construct a penalized least squares estimate of A under monotonic- 
ity constraints, where one penalizes monotone functions with low values 
of the threshold parameter. 

7. Yet another way of estimating 7 is to observe that {v,(3,^) = 
argmin^j, M(w, 6, r) and 



\{v,h,r) = E 



{Y -v-bl{R>r)cosQf 



(/.(i?,e,S)_ 

with 6, S) = f{R, Q){a'^ + T?) / h{R) . We can approximate this crite- 
rion function M, by the empirical expectation, and construct an estimate 
of 7 as the threshold that minimizes the sample analogue. This method 
avoids the estimation of A. However, it needs knowledge of which, 
in turn, involves estimation of /. This is not feasible with the currently 
available sample size, so we have not explored this approach in the paper. 



APPENDIX 

The Appendix contains descriptions of two asymptotic distributions that 
arise in the paper. The first of these may be derived by a straightforward 
adaptation of the arguments in Banerjee (2007) and Banerjee and Wellner 
(2001). The derivation of the second will appear elsewhere. 



The residual sum of squares statistic A.SSE{ro,^o). For a real- valued 
function / that is bounded below on M, let gcm(/,/) and slogcm(/, /) de- 
note the greatest convex minorant (GMC) and the left-hand slope of the 
GCM of the restriction of / to the interval /. We abbreviate gcm(/, M) and 
slogcm(/, M) by gcm(/) and slogcm(/) and let 

slogcm°(/) = (slogcm(/, (-cx),0]) AO)l(_oo,o] 

+ (slogcm(/,(0, 00)) V 0)1(0,00) • 

For positive constants c and d define the process Xc,rf(i) = cW{t) + dt^, 
where W is, standard two-sided Brownian motion starting from 0. Let gc,d = 
slogcm(Xc,rf) and ^ = slogcm^ (Kc^d) ■ Thus, 51,1 and gfi are the uncon- 
strained and constrained versions of the slope processes associated with the 
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"canonical" process Xi^i. For details about the processes and g^^, see 
Banerjee (2007) and Banerjee and Wellner (2001). Set 

^c4 = j[{9cA^)?-{9%{u)Y]dn 

and abbreviate Di^i to D. Then B^^^^ has the same distribution as c^B [see 
Banerjee (2007) and Banerjee and Wellner (2001)]. With this notation, if 
A(ro) = ^0) A is continuously differentiahle near tq, and A'(ro) > 0, then 
ASSE{rQ,^Q) converges in distribution to D. This may be established by 
adapting the arguments of Banerjee and Wellner (2001) to the cosine model. 



The split point estimation procedure. Recall the definitions of the break- 
point 7 and the best fitting /3 from Section 5 along with those of H{r) = 
/J- h{s) ds and A(r) = X{s)h{s) ds. Thus, /3 = [K{t) - K{^)]/[H[t) - H{^)]. 
Next let Gc,d = gcm(Xc,d), 



V- 



and 



£;[(ct2 + S2)-i] / cos2(0)/(ro, 9) dO ' 



2[H{t)-H{^)] 
-2A(7)/i(7) 



-2A(7)/i(7) 
4A(7)A'(7)/i(7) 



:A'(ro), 



Q(tl, t2) = 2(3h{^)[Ga,b{t2) - Ga,6(0) " W 



+ \t'Vt, 



with t = (ti,t2)- Now let 7 and (3 denote the estimators of 7 and (3 obtained 
by replacing A by A. /f (i) 7 is the unique minimizer of kq in (13), (ii) A 
is continuously differentiahle near-y, A' (7) > 0, and (iii) (/3,7) converges to 
at the rate n~^/^, then n^/^(/3 — 7 — 7) converges in distribution to 
argmin^Q(t). Here assumptions (i) and (ii) seem quite reasonable and even 
necessary in the sense that A' (7) appears in the asymptotic distribution. 
Assumption (iii) is inelegant. It is suggested by Banerjee and McKeague 
(2007b), but a complete proof is still lacking in the present context. 

The asymptotic distribution of n^/^{^ — 7) may be simplified by first 
minimizing Q(ti,t2) with respect to ti. After some algebra, it is found to 



be the distribution of argmin^^ [ 
and 



■,{t2) + Cti] , where Pa,fe(t2) = Ga,b{t2) 



htl 



1 

c = - 
2 



A'(7)4( 
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